CS 4824 Final Project

Author: Eshan Kaul
Team 29

Motivation

Research Question

Can the AC power production of the Indian solar plants be accurately forecasted in order to deliver an optimal energy-pricing strategy?

Data Introduction

All the data used in this notebook was retrieved from Kaggle as comma-separated values (CSV) files. The data was gathered at two solar power plants in India over a 34-day period at 15-minute intervals. The source contains four datasets, of which we analyzed two: a plant generation data file and a plant weather sensor data file. The plant generation data is gathered at the inverter level: each inverter has multiple lines of solar panels attached to it (dim: 68778 rows × 7 columns). The sensor data is gathered at the plant level: a single array of sensors optimally placed at the plant (dim: 3182 rows × 6 columns).

Read In Data + EDA

Correlation

Definition: any statistical relationship, whether causal or not, between two random variables or bivariate data. The correlation coefficient, or Pearson correlation coefficient, is obtained by dividing the covariance (the joint variability of two random variables) of the two variables by the product of their standard deviations.

$$ \rho_{X,Y} = \operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

$$ \operatorname{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$
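As a minimal sketch, the definition above can be computed directly with NumPy. The temperature and power values here are illustrative placeholders, not readings from the Kaggle dataset:

```python
import numpy as np

# Hypothetical example: ambient temperature vs. AC power output
# (illustrative values only, not the actual plant data).
temp = np.array([21.0, 24.5, 27.3, 30.1, 33.8, 35.2])
ac_power = np.array([110.0, 180.0, 260.0, 310.0, 390.0, 410.0])

# Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)
cov_xy = np.mean((temp - temp.mean()) * (ac_power - ac_power.mean()))
rho = cov_xy / (temp.std() * ac_power.std())

# NumPy's built-in matches the manual computation.
rho_builtin = np.corrcoef(temp, ac_power)[0, 1]
```

Because the covariance and both standard deviations use the same normalization, any choice of `ddof` cancels in the ratio, so the manual value agrees with `np.corrcoef` exactly.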

Decision Tree Regressor

The Decision Tree Regressor is a tree-based machine learning model that aims to predict continuous target values based on input features. It recursively divides the input space into non-overlapping regions, and each region is assigned an average target value calculated from the training data. The tree is constructed by selecting the best feature and threshold at each node to minimize the mean squared error (MSE) of the target values.

Algorithm

  1. Initialization: Start with the entire dataset at the root node.
  2. Node splitting: At each node, iterate through all features and potential thresholds to find the best feature and threshold that minimizes the MSE.
  3. Recursion: Recursively split the dataset into two subsets based on the best feature and threshold found in step 2, and create the left and right child nodes.
  4. Stopping criteria: Stop splitting if any of the following conditions is met:
    • The maximum depth of the tree is reached.
    • The number of samples in the node is less than the minimum samples required to split.
  5. Prediction: Assign the mean target value of the samples in a node as the prediction value for that node (leaf node).
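The steps above can be sketched as a small from-scratch regressor. This is an illustrative implementation of the generic algorithm, not the project's actual code; the function names are our own:

```python
import numpy as np

def best_split(X, y):
    """Find the (feature, threshold) minimizing the weighted child MSE (step 2)."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        # Exclude the largest value so both children are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Weighted sum of child MSEs (equivalently, sum of child SSEs).
            score = left.var() * len(left) + right.var() * len(right)
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Steps 1, 3, 4: recursively split until a stopping criterion is met."""
    if depth >= max_depth or len(y) < min_samples_split or np.all(y == y[0]):
        return float(y.mean())          # step 5: leaf predicts the mean target
    j, t = best_split(X, y)
    if j is None:                       # no valid split (constant features)
        return float(y.mean())
    mask = X[:, j] <= t
    return (j, t,
            build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples_split),
            build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples_split))

def predict_one(tree, x):
    """Walk the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree
```

For example, on targets `[1, 1, 5, 5]` over a single feature `[1, 2, 3, 4]`, the best split is at threshold 2, and each leaf predicts the mean of its side.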

Mean Squared Error (MSE)

The MSE is used as a splitting criterion to minimize the impurity in the child nodes. It is calculated as follows:

MSE = $\frac{1}{n} \sum_{i=1}^{n} (y_i - y_{\text{mean}})^2$

where $n$ is the number of samples in a node, $y_i$ is the target value of the i-th sample, and $y_{\text{mean}}$ is the mean target value of all samples in the node.

The best feature and threshold to split a node are selected by minimizing the weighted sum of the MSEs in the left and right child nodes.
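A small worked example of the splitting criterion, with hypothetical target values: the weighted child MSE after a good split is much lower than the MSE of the parent node, which is why the split is chosen.

```python
import numpy as np

def mse(y):
    """MSE of a node: mean squared deviation from the node's mean target."""
    return np.mean((y - y.mean()) ** 2)

# Toy node with 6 target values, and one candidate split into two children.
y_left = np.array([1.0, 1.2, 0.8])
y_right = np.array([5.0, 5.5, 4.5])

n = len(y_left) + len(y_right)
parent = mse(np.concatenate([y_left, y_right]))
weighted = (len(y_left) * mse(y_left) + len(y_right) * mse(y_right)) / n
# The split separates the low and high targets, so weighted child MSE
# is far below the parent's MSE: the impurity decreases.
```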

Benefits

Limitations

Gradient Boosting Regressor

Gradient Boosting Regressor is an ensemble learning technique that builds a strong model by combining the predictions of multiple weak models, typically decision trees. It works by iteratively fitting decision trees to the residual errors of the previous model, thereby minimizing the overall loss.

Algorithm

  1. Initialize: Calculate the bias term as the mean of the target variable, and initialize the predictions with this bias term.
  2. Boosting rounds: Perform a given number of boosting rounds (n_estimators), where each round consists of the following steps:
    • Calculate the residuals between the target values and the current predictions.
    • Train a decision tree regressor on the residuals, using a specified maximum depth and minimum number of samples required to split a node.
    • Update the predictions by adding the weighted predictions of the trained decision tree, scaled by a learning rate (shrinkage parameter).
  3. Prediction: Predict the target values for given input features by summing the bias term and the weighted predictions from each decision tree.
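The boosting loop above can be sketched with regression stumps (one-split trees) as the weak learners. This is a simplified illustration under our own naming, not the project's implementation:

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-split regression tree (stump) minimizing the child SSE."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            m = X[:, j] <= t
            left, right = y[m], y[~m]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    _, j, t, lv, rv = best
    # Return a predictor that maps rows to the left/right leaf mean.
    return lambda Z, j=j, t=t, lv=lv, rv=rv: np.where(Z[:, j] <= t, lv, rv)

def gradient_boost(X, y, n_estimators=50, learning_rate=0.1):
    bias = y.mean()                       # step 1: initialize with the mean
    pred = np.full(len(y), bias)
    stumps = []
    for _ in range(n_estimators):         # step 2: boosting rounds
        residuals = y - pred              # fit each weak learner to the residuals
        stump = fit_stump(X, residuals)
        pred += learning_rate * stump(X)  # shrink each contribution
        stumps.append(stump)
    return bias, stumps

def gb_predict(bias, stumps, X, learning_rate=0.1):
    # Step 3: bias term plus the scaled contribution of every weak learner.
    return bias + learning_rate * sum(s(X) for s in stumps)
```

With a learning rate of 0.1, each round removes 10% of the remaining residual that the stump can explain, so the training error shrinks geometrically across rounds.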

Learning Rate

The learning rate, or shrinkage parameter, is a positive scalar that scales the contribution of each decision tree to the overall model. A smaller learning rate results in a more conservative model, reducing the risk of overfitting, but may require more boosting rounds to achieve optimal performance.

Benefits

Limitations

Kernel Partial Least Squares (KPLS) Regression

Kernel Partial Least Squares (KPLS) Regression is an extension of the Partial Least Squares (PLS) Regression that utilizes kernel functions to model complex nonlinear relationships between input and output variables. It combines the feature extraction abilities of kernel methods, such as the Support Vector Machine (SVM), with the dimensionality reduction and regression capabilities of PLS.

Algorithm

  1. Kernel matrix: Compute the kernel matrix K using a chosen kernel function (e.g., RBF kernel) that maps the input features to a higher-dimensional space.
  2. Centering: Center the kernel matrix K and target values Y.
  3. Iterative decomposition: For a given number of components, iteratively extract the latent variables (scores) T and U, and the weight vectors P.
    • Initialize U as the centered target values Y.
    • Compute the score vector T as the product of the centered kernel matrix K and U, and normalize T.
    • Compute the weight vector P as the product of the centered kernel matrix K and T, and normalize P.
    • Deflate the centered kernel matrix K by subtracting the extracted component, i.e., K ← K − T Tᵀ K (the outer product of T with itself, multiplied by K).
  4. Regression: Calculate the regression coefficients W by solving the linear system T * W = Y.
  5. Prediction: Predict the target values Y_pred using the product of the scores T and the regression coefficients W.
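The steps above can be sketched for a single output variable as follows. This is a simplified illustration, not the project's implementation; in particular, it also deflates the centered targets after each component (an assumption we add so that later score vectors remain non-trivial), and all names are our own:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Step 1: pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kpls_fit(X, y, n_components=2, gamma=1.0):
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                              # step 2: center the kernel matrix
    yc = y - y.mean()                           # ... and the targets
    T = np.zeros((n, n_components))
    for a in range(n_components):               # step 3: extract components
        u = yc                                  # initialize U with the (deflated) targets
        t = Kc @ u
        t = t / np.linalg.norm(t)               # normalized score vector
        T[:, a] = t
        Kc = Kc - np.outer(t, t) @ Kc           # deflate K
        yc = yc - t * (t @ yc)                  # deflate y (added assumption)
    yc0 = y - y.mean()
    W = np.linalg.lstsq(T, yc0, rcond=None)[0]  # step 4: solve T W = y (centered)
    y_pred = T @ W + y.mean()                   # step 5: in-sample prediction
    return T, W, y_pred
```

Because each new score vector is built from the deflated kernel matrix, the columns of T come out mutually orthogonal, which makes the regression step a well-conditioned least-squares problem.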

Radial Basis Function (RBF) Kernel

The RBF kernel is a popular choice for nonlinear kernel methods because of its flexibility and smoothness. It is defined as:

$K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$

where $x$ and $x'$ are two input feature vectors, and $\gamma$ is a positive parameter that controls the width of the kernel. A larger $\gamma$ results in a more flexible, localized model, while a smaller $\gamma$ leads to a smoother, more global model.
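A quick sketch of how $\gamma$ changes the kernel value for a fixed pair of points (the vectors here are arbitrary illustrative values):

```python
import numpy as np

def rbf(x1, x2, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for two feature vectors."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

x, x_prime = np.array([1.0, 2.0]), np.array([2.0, 4.0])  # ||x - x'||^2 = 5

# Identical points always give the maximum similarity of 1.
k_self = rbf(x, x, gamma=1.0)

# A larger gamma makes the kernel decay faster with distance (more local):
k_small_gamma = rbf(x, x_prime, gamma=0.1)  # exp(-0.5)
k_large_gamma = rbf(x, x_prime, gamma=2.0)  # exp(-10)
```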

Benefits

Limitations